Abusive Language Detection in Online User Content
نویسندگان
چکیده
Detection of abusive language in user generated online content has become an issue of increasing importance in recent years. Most current commercial methods make use of blacklists and regular expressions, however these measures fall short when contending with more subtle, less ham-fisted examples of hate speech. In this work, we develop a machine learning based method to detect hate speech on online user comments from two domains which outperforms a state-ofthe-art deep learning approach. We also develop a corpus of user comments annotated for abusive language, the first of its kind. Finally, we use our detection tool to analyze abusive language over time and in different settings to further enhance our knowledge of this behavior.
منابع مشابه
Do Characters Abuse More Than Words?
Although word and character n-grams have been used as features in different NLP applications, no systematic comparison or analysis has shown the power of character-based features for detecting abusive language. In this study, we investigate the effectiveness of such features for abusive language detection in user-generated online comments, and show that such methods outperform previous state-of...
متن کاملDetection of abusive messages in an on-line community
Moderating user content in online communities is mainly performed manually, and reducing the workload through automatic methods is of great interest. The industry mainly uses basic approaches such as bad words filtering. In this article, we consider the task of automatically determining whether a message is abusive or not. This task is complex, because messages are written in a non-standardized...
متن کاملGraph-Based Features for Automatic Online Abuse Detection
While online communities have become increasingly important over the years, the moderation of user-generated content is still performed mostly manually. Automating this task is an important step in reducing the financial cost associated with moderation, but the majority of automated approaches strictly based on message content are highly vulnerable to intentional obfuscation. In this paper, we ...
متن کاملA Unified Deep Learning Architecture for Abuse Detection
Hate speech, offensive language, sexism, racism and other types of abusive behavior have become a common phenomenon in many online social media platforms. In recent years, such diverse abusive behaviors have been manifesting with increased frequency and levels of intensity. This is due to the openness and willingness of popular media platforms, such as Twitter and Facebook, to host content of s...
متن کاملImpact Of Content Features For Automatic Online Abuse Detection
Online communities have gained considerable importance in recent years due to the increasing number of people connected to the Internet. Moderating user content in online communities is mainly performed manually, and reducing the workload through automatic methods is of great financial interest for community maintainers. Often, the industry uses basic approaches such as bad words filtering and ...
متن کامل